Geospatial analysis in GeoPandas

Further learning: https://autogis-site.readthedocs.io/en/latest/index.html

Norway shapefile: https://kartkatalog.geonorge.no/metadata/administrative-enheter-fylker/6093c8a8-fa80-11e6-bc64-92361f002671

GeoPandas, as the name suggests, extends the popular data science library pandas by adding support for geospatial data. If you are not familiar with pandas, we recommend taking a quick look at its Getting started documentation before proceeding.

The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations. The geopandas.GeoSeries, a subclass of pandas.Series, handles the geometries. Therefore, your GeoDataFrame is a combination of pandas.Series, with traditional data (numerical, boolean, text etc.), and geopandas.GeoSeries, with geometries (points, polygons etc.). You can have as many geometry columns as you wish; unlike typical desktop GIS software, GeoPandas imposes no limit.
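To make the structure concrete, here is a minimal sketch that builds a GeoDataFrame by hand. The city names and approximate longitude/latitude values are illustrative sample data, not part of the tutorial's datasets.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: three cities with approximate lon/lat coordinates.
gdf = gpd.GeoDataFrame(
    {
        "name": ["Oslo", "Bergen", "Trondheim"],
        "geometry": [Point(10.75, 59.91), Point(5.32, 60.39), Point(10.40, 63.43)],
    },
    crs="EPSG:4326",
)

# An ordinary column is a pandas.Series; the geometry column is a GeoSeries.
print(type(gdf["name"]))
print(type(gdf["geometry"]))
```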

Each GeoSeries can contain any geometry type (you can even mix them within a single array) and has a GeoSeries.crs attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Therefore, each GeoSeries in a GeoDataFrame can be in a different projection, allowing you to have, for example, multiple versions (different projections) of the same geometry.

Only one GeoSeries in a GeoDataFrame is considered the active geometry, which means that all geometric operations applied to a GeoDataFrame operate on this active column.
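The following sketch illustrates several geometry columns with different projections and how to switch the active one. The column name geometry_utm and the sample points are assumptions made for illustration; EPSG:25833 (ETRS89 / UTM zone 33N) is used here simply as an example of a projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample points in lon/lat (EPSG:4326).
gdf = gpd.GeoDataFrame(
    {"name": ["Oslo", "Bergen"]},
    geometry=[Point(10.75, 59.91), Point(5.32, 60.39)],
    crs="EPSG:4326",
)

# Add a second geometry column holding the same points in a projected CRS.
gdf["geometry_utm"] = gdf.geometry.to_crs("EPSG:25833")

# The original column is still the active geometry.
print(gdf.geometry.name)
print(gdf["geometry_utm"].crs)

# set_geometry() returns a new GeoDataFrame with a different active column.
gdf_utm = gdf.set_geometry("geometry_utm")
print(gdf_utm.geometry.name)
print(gdf_utm.crs)
```

Spatial methods called on gdf_utm, such as distance or area calculations, would then operate on the projected column.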

Reading files

Assuming you have a file containing both data and geometry (e.g. GeoPackage, GeoJSON, Shapefile), you can read it using geopandas.read_file(), which automatically detects the file type and creates a GeoDataFrame. This tutorial uses the "nybb" dataset, a map of New York boroughs, which was bundled with GeoPandas installations before version 1.0. In those versions, geopandas.datasets.get_path() returns the path to the dataset. The geopandas.datasets module was removed in GeoPandas 1.0; with newer versions, the same dataset is available through the separate geodatasets package.

Writing files

To write a GeoDataFrame back to a file, use GeoDataFrame.to_file(). The default file format is Shapefile, but you can specify your own with the driver keyword.

Built-in functions

Projections

Each GeoSeries has its Coordinate Reference System (CRS) accessible at GeoSeries.crs. The CRS tells GeoPandas where the coordinates of the geometries are located on the earth’s surface. In some cases, the CRS is geographic, which means that the coordinates are in latitude and longitude. The most common geographic CRS is WGS84, with the authority code EPSG:4326. In other cases, the CRS is projected, and the coordinates are in linear units such as metres. Let’s see the projection of our Norway GeoDataFrame.
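The sketch below shows how a CRS can be inspected and changed. It uses a small hand-made GeoDataFrame rather than the Norway shapefile, and the choice of EPSG:25833 as the target projection is an assumption for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample points in lon/lat (WGS84).
gdf = gpd.GeoDataFrame(
    {"name": ["Oslo", "Trondheim"]},
    geometry=[Point(10.75, 59.91), Point(10.40, 63.43)],
    crs="EPSG:4326",
)

# Inspect the CRS; is_geographic is True for lat/lon coordinates.
print(gdf.crs)
print(gdf.crs.is_geographic)

# Reproject to a projected CRS whose coordinates are in metres.
gdf_utm = gdf.to_crs(epsg=25833)
print(gdf_utm.crs.is_projected)
```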

Measure distance between points:
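One way to measure distances is sketched below, assuming hypothetical city coordinates. The points are reprojected to a projected CRS first (EPSG:25833 is an assumed choice), because distances computed directly on latitude/longitude values would be in degrees rather than metres.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample points in lon/lat, reprojected so units are metres.
cities = gpd.GeoDataFrame(
    {"name": ["Oslo", "Bergen", "Trondheim"]},
    geometry=[Point(10.75, 59.91), Point(5.32, 60.39), Point(10.40, 63.43)],
    crs="EPSG:4326",
).to_crs(epsg=25833)

# Planar distance in metres from every point to the first point (Oslo).
oslo = cities.geometry.iloc[0]
cities["distance"] = cities.geometry.distance(oslo)

print(cities[["name", "distance"]])
```

These are planar distances in the chosen projection, so they are approximations of the true distances on the earth's surface.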

Note that geopandas.GeoDataFrame is a subclass of pandas.DataFrame, so all the pandas functionality is available on the geospatial dataset. We can even perform data manipulations with the attributes and geometry information together.

For example, to calculate the average of the distances measured above, access the distance column and call the mean() method on it:
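As a self-contained illustration, the sketch below uses made-up points in a projected CRS with coordinates chosen so the distances are round numbers. The column name distance is an assumption here.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical points in a projected CRS (EPSG:25833), coordinates in metres.
points = gpd.GeoDataFrame(
    {"name": ["origin", "p1", "p2"]},
    geometry=[Point(0, 0), Point(3000, 4000), Point(6000, 8000)],
    crs="EPSG:25833",
)

# Distances from the first point: 0 m, 5000 m and 10000 m.
points["distance"] = points.geometry.distance(points.geometry.iloc[0])

# The distance column is a plain pandas.Series, so mean() works as usual.
mean_distance = points["distance"].mean()
print(mean_distance)  # 5000.0 for these points
```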

Making maps

GeoPandas can also plot maps, so we can check how the geometries appear in space. To plot the active geometry, call GeoDataFrame.plot(). To color code by another column, pass in that column as the first argument. In the example below, we plot the active geometry column and color code by the "area" column. We also want to show a legend (legend=True).
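A minimal plotting sketch follows. The three rectangular "regions" are hypothetical stand-ins for real polygons, and matplotlib is assumed to be installed, since GeoDataFrame.plot() relies on it.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical rectangular regions in a projected CRS (units: metres).
regions = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[
        box(0, 0, 1000, 1000),
        box(1000, 0, 3000, 1000),
        box(0, 1000, 3000, 3000),
    ],
    crs="EPSG:25833",
)

# Store each polygon's area (square metres) in a regular column.
regions["area"] = regions.geometry.area

# Plot the active geometry, colored by the "area" column, with a legend.
ax = regions.plot("area", legend=True)
ax.set_title("Hypothetical regions colored by area")
```

In a notebook the figure is displayed automatically; in a script you would typically call matplotlib.pyplot.show() or save the figure.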

Let's plot the E6 traffic registration stations on the map of Norway:
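Layering two datasets on one map generally works by plotting the base layer first and passing its axes to the second plot. The sketch below uses a plain bounding box as a stand-in for the Norway outline and three invented station locations, so all names and coordinates are hypothetical and not the actual station data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for the country outline: a rough lon/lat bounding box.
country = gpd.GeoDataFrame(
    {"name": ["outline"]},
    geometry=[box(4, 58, 31, 71)],
    crs="EPSG:4326",
)

# Hypothetical station locations (lon/lat), not real registration stations.
stations = gpd.GeoDataFrame(
    {"station": ["S1", "S2", "S3"]},
    geometry=[Point(10.75, 59.91), Point(10.40, 63.43), Point(14.40, 67.28)],
    crs="EPSG:4326",
)

# Both layers must share a CRS before they are drawn on the same axes.
stations = stations.to_crs(country.crs)

# Draw the base layer, then draw the points on top using the same axes.
ax = country.plot(color="lightgrey", edgecolor="black")
stations.plot(ax=ax, color="red", markersize=20)
ax.set_title("Stations over outline (hypothetical data)")
```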

GeoJSON data from Vegkart

A lot of interesting open data is available from Statens Vegvesen via Vegkart: https://vegkart.atlas.vegvesen.no/

All available data is listed in the Datakatalog: https://datakatalogen.vegdata.no/

We will now learn how to import this data into GeoPandas. There is a Python library available to extract data from Vegkart: https://github.com/LtGlahn/nvdbapi-V3. However, version 3 of this package does not include the functionality to convert the data to GeoJSON format. I've extracted that code from version 2 of nvdbapi and imported it into my own fork of version 3: https://github.com/alexdiem/nvdbapi-V3. We will use this code to import Vegkart data into GeoPandas.
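Once data is available as GeoJSON, it can be turned into a GeoDataFrame with GeoDataFrame.from_features(). The sketch below does not call nvdbapi or Vegkart at all: the feature collection, the property names (object_type_id, label) and the CRS are hypothetical placeholders standing in for whatever the conversion code returns.

```python
import geopandas as gpd

# Hypothetical GeoJSON-like dict; real Vegkart output will have other fields.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"object_type_id": 1, "label": "first"},
            "geometry": {"type": "Point", "coordinates": [10.75, 59.91]},
        },
        {
            "type": "Feature",
            "properties": {"object_type_id": 1, "label": "second"},
            "geometry": {"type": "Point", "coordinates": [10.40, 63.43]},
        },
    ],
}

# Build the GeoDataFrame from the list of features.
# Assumption: coordinates are lon/lat, so the CRS is set to EPSG:4326.
vegkart_gdf = gpd.GeoDataFrame.from_features(collection["features"], crs="EPSG:4326")

print(vegkart_gdf)
```

If the real data uses a different coordinate system, pass that CRS instead of EPSG:4326.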

The data saved in Vegkart is encoded into numerical IDs according to Datakatalog: